# Multilingual VLM
Ristretto 3B
Apache-2.0
Ristretto is a vision-language model that uses dynamic image token allocation: the number of image tokens can be adjusted flexibly to suit the task. It is described as outperforming previous generations in both performance and versatility.
Image-to-Text
Transformers
Supports Multiple Languages

LiAutoAD
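The listing does not say how Ristretto produces or selects its image tokens. As a purely hypothetical illustration of what "adjusting the number of image tokens" can mean, the sketch below average-pools a square grid of patch embeddings down to a smaller grid sized to fit a token budget. The names `pick_side` and `pool_image_tokens`, the average-pooling strategy, and the 24×24 grid are assumptions for illustration only, not Ristretto's actual method.

```python
from typing import List

# Hypothetical layout: rows x cols x feature_dim patch embeddings from a vision encoder.
Grid = List[List[List[float]]]


def pick_side(grid_side: int, token_budget: int) -> int:
    """Return the largest side length s that divides grid_side with s*s <= token_budget."""
    if token_budget < 1:
        raise ValueError("token_budget must be at least 1")
    best = 1
    for s in range(1, grid_side + 1):
        if grid_side % s == 0 and s * s <= token_budget:
            best = s
    return best


def pool_image_tokens(grid: Grid, target_side: int) -> List[List[float]]:
    """Average-pool a square patch grid to target_side x target_side tokens (row-major)."""
    side = len(grid)
    if target_side <= 0 or side % target_side != 0:
        raise ValueError("target_side must evenly divide the grid side length")
    window = side // target_side  # each output token averages a window x window block
    dim = len(grid[0][0])
    tokens = []
    for r in range(target_side):
        for c in range(target_side):
            acc = [0.0] * dim
            for i in range(r * window, (r + 1) * window):
                for j in range(c * window, (c + 1) * window):
                    for d in range(dim):
                        acc[d] += grid[i][j][d]
            n = window * window
            tokens.append([v / n for v in acc])
    return tokens
```

With a 24×24 grid (576 patches), a budget of 64 selects an 8×8 grid of 64 tokens, while a budget of 200 selects 12×12 = 144 tokens; fewer tokens trade spatial detail for a shorter sequence.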
PaliGemma 2 10B PT 224
PaliGemma 2 is a vision-language model (VLM) that pairs the SigLIP vision encoder with the Gemma 2 language model. It takes image and text inputs together, generates text output, and supports multiple languages. It is suited to vision-language tasks such as image and short-video captioning, visual question answering, text reading, object detection, and object segmentation.
Image-to-Text
Transformers

google
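For object detection, PaliGemma models emit bounding boxes as text. The sketch below parses that text into pixel coordinates, assuming the convention described in Google's PaliGemma documentation: four `<locNNNN>` tokens in y_min, x_min, y_max, x_max order, each quantized into 1,024 bins, followed by the label, with multiple detections separated by ` ; `. The `Box` type and `parse_detections` helper are hypothetical, and the exact output format of this pretrained (`pt`) checkpoint should be checked against its model card.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


# Assumed format: four <locNNNN> tokens (y_min, x_min, y_max, x_max) then a label.
_DETECTION = re.compile(r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)")


def parse_detections(text: str, width: int, height: int, bins: int = 1024) -> List[Box]:
    """Convert detection text into pixel-space boxes for an image of width x height."""
    boxes = []
    for m in _DETECTION.finditer(text):
        # Normalize each bin index to [0, 1) before scaling to the image size.
        y0, x0, y1, x1 = (int(g) / bins for g in m.group(1, 2, 3, 4))
        boxes.append(Box(m.group(5).strip(), x0 * width, y0 * height, x1 * width, y1 * height))
    return boxes
```

For example, on a 448×224 image, `<loc0256><loc0128><loc0768><loc0896> cat` maps to a "cat" box from (56, 56) to (392, 168).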